Chasing unicorns!

0 dataloss and 99.999% availability

Open Source Experience

Who am I

Image by Anemone123 from Pixabay

Chasing unicorns

  • About dataloss and downtime
  • CAP and PACELC theorems
  • Some typical architectures
Image by João Paulo from Pixabay

0 dataloss

  • Does not exist
  • Sorry :-)
  • Different meanings
Image by Pete Linforth from Pixabay

Allowed failure time

Availability needed daily weekly yearly
100% 0 0 0
99.999% <0.9s 6s <5min
99.99% <8s 1min 52min
99.9% 1.5min 10min 8h45min
99.999% during daytime 0.4s 3s <3min

Anythink can break

Image by WikiImages from Pixabay

Anythink can break

Image by WikiImages from Pixabay

Anythink can break

Image by WikiImages from Pixabay

CAP theorem

When designing distributed web services, there are three properties that are commonly desired: consisteny, availability, and partition tolerance. It is impossible to achieve all three.

CAP theorem

Consistency

Each node will see the last (committed) version of the data.
Image by Andrew Martin from Pixabay

Availability

The system will always answer something different than an error message.
Image by Andrew Martin from Pixabay

Network partitioning

The network will be allowed to lose arbitrarily many messages sent from one node to another without altering the system answers.

CAP theorem

  • impossibility to have all 3
  • you can secure 2
  • then stretch to the third
RDBMS are CA
⟶ True and False
RDBMS in single node configuration are CA
Image by Gerd Altmann from Pixabay
RDBMS are CA
⟶ True and False
RDBMS with asynchronous streaming replication are AP
Image by Gerd Altmann from Pixabay
RDBMS are CA
⟶ True and False
RDBMS with synchronous streaming replication are neither CP, neither AP
Image by Gerd Altmann from Pixabay

PACELC theorem

In case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C),
but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consisteny (C).
Image by congerdesign from Pixabay

Latency

Latency is the amount of time the system needs to answer when it's smaller than a request timeout.
Image by klimkin from Pixabay

Blueprints

  • PostgreSQL
  • These are only examples
  • Check the weakest link


A good architecture is a KISS architecture!
Image by Free-Photos from Pixabay

Just some vocabulary

  • Queen: database server who can read and write
  • Princess: database server who's waiting for the Queen to fall to take its place
  • Worker: database server who's only reading

Dataloss

RPO Architecture
≥ 15min Physical backup + WALs archiving
≥ 1min Physical backup + WALs archiving
+ Asynchronous princess
≥ 200ms Physical backup + WALs archiving
+ Synchronous princess (+ asynchronous princess)
0 Not possible 😏
Image by Free-Photos from Pixabay

High availability

RTO Architecture
≥ 24h Physical backup + WALs archiving (depending on the time to restore)
≥ 30min Princess + manual failover
≥ 5min Princess + automatic failover

Haute disponibilité

RTO Architecture
≥ 1min/30s Multi-queens
0 Not possible 😏
  • The database is not the only element that can fail
  • You are not Google
  • Figure out what your RPO is
  • Figure out what your RTO is
Image by TanteTati from Pixabay
Unicorns are more real than 0-dataloss and 99.999% availability.
Gülçin Yıldırım Jelínek, Prague, 2019-08-27
Any questions?
Image by Andrew Martin from Pixabay